ABSTRACT

A survey of the literature of instructional 
evaluation, highlighting appropriate methods for encouraging, 
assessing, and documenting effective higher education English 
instruction, can aid English departments in search of valid measures 
of teaching effectiveness. Before a department can formalize any 
system of assessment, it must first establish some consensus about 
what constitutes good teaching based on the proportional emphasis 
assigned to each of the following areas: content expertise, 
instructional delivery skills, and instructional design skills. 
Although data relating to each of these instructional roles may come 
from a variety of sources, no single source is appropriate for 
assessing a teacher's effectiveness in all three. Student evaluation, 
peer evaluation, and self-evaluation all have strengths and 
weaknesses when used to evaluate teaching effectiveness. Two 
strategies for the development of effective assessment procedures 
are: (1) compilation of a current anthology of evaluative instruments
and procedures validated within specific settings; and (2) 
establishment of a corpus of research in the area of teaching 
assessment by scholars in English study. (One figure and two notes 
are included; 41 references are attached.) (RS) 
Christopher Gould

Converting Faculty Assessment into Faculty Development: 
The Director of Composition's Responsibility to Probationary Faculty 

The Cheney Report, a strident rebuke to higher education, is 
seasoned with the acerbic observations of disillusioned 
academics. A professor of classics declares that "universities 
are as uncongenial to teaching as the Mojave Desert to a clutch 
of Druid priests." An historian finds his colleagues to be "in 
full flight from teaching" and adds that "[i]n many universities, 
faculty members make no bones about the fact that students are 
the enemy" (qtd. in "'Research'" A22). Though measured and 
restrained by comparison, the recent report of the Carnegie 
Foundation for the Advancement of Teaching lends further support 
to the popular belief that college professors teach poorly and 
half-heartedly.^1

Apprehensive that current trends could possibly transform 
the conventional academic reward system, university 
administrators are showing renewed interest in the assessment of 
teaching effectiveness. For English departments, usually alert 
to periodic demands for better teaching, the scenario is 
familiar. Responding to the accountability movement of the 
sixties and seventies, the Modern Language Association published 
a survey of evaluation practices in college and university 
English departments throughout the United States and Canada. 
Seeking "to differentiate the art of evaluating teaching in 
English from the art of evaluating instruction in other 
subjects," Richard Larson, the author of this survey,
acknowledged that he could discern "no bases or procedures for
evaluation intended solely for courses in English" (6). Larson's 
disappointment is echoed in the findings of a concurrent, though 
less focused, survey conducted by Thomas Wilcox and summarized in 
a volume titled The Anatomy of College English:

The college teacher is seldom or never observed at his 
work except by those on whom he is working, and the 
effects of his efforts cannot be assessed. Those who 
are charged with judging his competence as a teacher 
must therefore gather their evidence by indirect, 
imperfect means, and often the evidence they acquire is 
of dubious validity. (29)

Further corroboration can be found in the 1965 report of an ad
hoc faculty committee at Yale, which concluded that "the problem 
of evaluating teaching is one for which no solution seems 
altogether satisfactory" (qtd. in Wilcox 37). This frustrating 
state of affairs underscored, in Larson's words, a "pressing need 
. . . for discussion of exactly what we think of as 'good 
teaching' in our subject" (6). 

Today, twenty years after the publication of these reports, 
English departments still have not reached any firm consensus 
regarding the measurement of effective teaching. Or, if they 
have, that consensus has not been set forth in any widely 
published guidelines or recommended procedures. A review of ERIC 
documents indexed since 1970 supports this conclusion: 
permutations similar to the one under which Larson's report is 
indexed ("College English" or "English Departments" or "English
Instruction" and "Teacher Evaluation") bring forth sixty-five
documents, twenty-two of which are concerned, at least to some 
extent, with higher education. Of these, only four have been 
published since 1983; another six appeared in print between 1976 
and 1982; and the remaining twelve predate 1976. These numbers 
suggest that interest in the evaluation of teaching, at least as 
a topic of research and professional debate, has declined over 
the past two decades. 

A closer look at the twenty-two documents shows that only 
three titles (excluding the titles by Larson and Wilcox already 
cited) address comprehensive methods of evaluating instructional 
effectiveness. One of these, an article by Kenneth Eble in the 
ADE Bulletin, is confined to fairly broad generalities: data must
be gathered in a consistent manner; these data should be clear 
and intelligible; tenured professors should strive to remain 
informed about the teaching practices of junior faculty. The two 
other sources, a "Staffroom Interchange" item from CCC and a
doctoral dissertation, will be discussed hereafter.

Given this dearth of resources specific to the discipline, 
English departments in search of valid measures of teaching 
effectiveness must turn to the more substantial database provided 
by educational research. There are, however, understandable 
misgivings about the applicability of methods and standards that 
may be more appropriate to fields in which learning is defined as 
the acquisition and retention of subject matter. It is my aim, 
therefore, to survey the literature of instructional evaluation,
highlighting appropriate methods for encouraging, assessing, and
documenting effective English instruction. 

This survey should offer several practical benefits. First, 
it provides guidance for departmental administrators in colleges 
and universities that stress teaching. Departments that have 
evaluative instruments and systems in place can compare their 
practices with the findings and recommendations of research; 
departments that have yet to formalize a comprehensive system 
will find guidelines for initiating one. Also, this survey 
presents methods whereby senior faculty in research universities 
can document the aptitudes and professional development of 
graduate students who seek entry-level jobs in teaching 
institutions. Finally, I hope to reintroduce the topic of 
instructional evaluation as a serious professional concern in 
English study. I shall therefore conclude with recommendations 
for research and an appeal for a published compilation of 
successful evaluative practices employed by college and 
university English departments throughout the country. 

I want to preface this survey with two important 
observations. First, authorities in the field of instructional 
evaluation agree that data used in a formative context (to 
improve teaching) must be carefully distinguished from data used 
for summative purposes (to evaluate faculty members, usually for 
tenure and promotion). Since departments can, as a rule, 
exercise greater latitude in gathering the former (Cohen and 
McKeachie 147), I shall keep this distinction in view, 
identifying those methods of assessment that are less appropriate 
Data-Gathering Specification Matrix

                                   Sources
Teaching Role                      | Students      | Peers                           | Self
1) Instructional Delivery Skills   | Questionnaire |                                 | Self Report or Questionnaire
2) Instructional Design Skills     | Questionnaire | Peer Review of Materials        | Self Report
3) Content Expertise               |               | Peer Analysis of Course Content | Self Report

Student Evaluation. Despite disagreements over their
validity, student surveys have become a stable component of 
faculty evaluation in most colleges and universities. These 
surveys are used for assessing instructional design and delivery. 

Hundreds of books, chapters, and articles have examined the 
design and validation of student questionnaires. Cashin (Student 
Ratings) provides a useful survey of the current professional 
consensus that arises from this array of research and 
scholarship. A particularly useful and compendious resource for 
departments wishing to introduce or reexamine a system of student 
evaluation is the "Student Rating Form Selection and Development 
Kit" developed by Aleamoni and Arreola. Rating forms are also 
indexed and reviewed in the annual volumes of Mental Measurements
Yearbook and Tests in Print; as newer forms are developed and
tested, announcements and reviews are published in three leading
professional journals: Instructional Evaluation, Journal of
Educational Measurement, and Measurement News.

Although most departments use information gathered from 
student surveys to evaluate instructors, there are also ways of 
using it to improve teaching. Cohen and Herr report some success 
in this regard when surveys are taken at midterm and the results 
are made immediately available. Aleamoni ("Usefulness"), 
however, has found that student evaluation is more likely to lead 
to improved teaching when accompanied by personal consultation 
with a peer or other resource person. Some experts favor the 
"cafeteria-style" survey form, which allows instructors to target 
specific items from a long list of criteria, although Cashin 
("Assessing" 98) argues that this is inappropriate for summative 
evaluation. Cooper has designed a nine-step process (involving
class visitation, a midterm student survey, and videotaping)
whereby such personal consultation can be regularized and made effective.
Gil, arguing that student evaluation without professional 
consultation will not affect teaching, arrives at the intuitively 
evident conclusion that teachers, like students, benefit more 
from positive than from negative feedback. Finally, a department 
that simply cannot commit the necessary resources to professional 
consultation can at least try to identify the classroom behaviors 
privileged by the student questionnaire it uses. (Murray has 
found that, in a very general sense, student surveys tend to 
valorize leadership, extraversion, objectivity, and lack of
anxiety.)

Two controversial methods of gathering student data involve
surveys of former students (see Smith and Nason) and evaluation 
of learning. The latter method, which entails some type of 
testing, is, according to Centra, "practical only in multi- 
section courses to which students [have] been preferably randomly 
assigned, and which employ a common examination that was unknown 
to the instructors (to avoid teaching to the test)" ("Colleagues" 
336). Freshman composition offers an obvious opportunity for 
this type of evaluation, although the dangers of a poorly 
designed or inadequately tested approach are considerable. 

In short, most authorities agree that student evaluation is 
at least as valid as other measures of teaching effectiveness and 
probably is the single most effective measure of instructional 
delivery. Alone, however, it is not an adequate means of 
evaluating teachers, and there are ways to maximize its validity 
as a summative instrument as well as to enhance its usefulness as 
a formative one. 

Peer Evaluation. Faculty peers are better qualified than
students to assess content expertise and are also well suited to 
evaluate instructional design. More specifically, colleagues are 
best qualified to evaluate the following: mastery and selection 
of course content, course structure and objectives, 
appropriateness of assignments and other course materials, 
currency of instructional methodology, commitment to teaching, 
and support of the departmental mission (Cohen and McKeachie 
148). Most of these criteria can be assessed by examining 
syllabi, printed assignments, grade reports, responses to student 
work, and similar documents. Detailed questionnaires for
recording such assessments have been developed by French-Lazovik
(79-81) and Seldin (Changing Practices 162).

One area in which peer evaluation may be misused, however, 
is classroom observation. One researcher states unequivocally: 
"peer ratings based on visitation are so lacking in reliability 
that they are useless for summative purposes. This is true even 
when visitation is carried out in a very systematic way" (French- 
Lazovik 74). Proof can be found in a study by Ward, Clark, and 
Von Harrison, who used trained "covert" and "overt" evaluators to 
demonstrate that teaching effectiveness improved dramatically on 
those days when instructors knew they were being observed. 

Classroom visitation is, however, an appropriate method of 
formative assessment, though again the choice of instruments is 
crucial. In general, a good observation instrument is one that 
targets the same criteria as a department's student 
questionnaire. A wealth of examples can be found in Borich and 
Madden (150-76) and Simon and Boyer (107-682). More recently, 
observation instruments have been developed by Seldin (Changing
Practices 163-65); Sorcinelli (13); and Helling (150-54), whose 
form elicits only favorable and supportive responses. Although 
instruments vary in length, structure, and content, most 
observation models feature a three-step process that includes a
preliminary conference, class visitation, and a follow-up 
conference. Generally, observations are conducted by a three- or 
four-person committee, at least one member of which is nominated 
by the visited instructor.

Two sources that relate specifically to English study
describe recommended techniques of peer evaluation. Garman's 
1971 doctoral dissertation sets forth a method whereby English 
departments can use a clinical supervisor as a resource person. 
William Woods describes a system of classroom observation used at 
Wichita State University. Although Woods's article addresses an 
urgent need, its usefulness is limited since it does not describe 
the observation instrument used, and, like most items in CCC's
"Staffroom Interchange" column, it lacks documentation. 

Two other methods of peer evaluation are a seminar report 
delivered to an audience of colleagues and videotaping of 
classroom instruction (Lichty and Peterson). The former approach 
clearly provides an opportunity to demonstrate content expertise 
as well as skills involving instructional design. Videotaping, 
however, is more problematic: although it may eliminate some of 
the tension and artificiality inherent to the physical presence 
of three or four observers in the class, there is no reason to 
suppose that videotaped classes are any less subject to the 
distorting effects reported by Ward, Clark, and Von Harrison. 
Most experts treat videotaping as a method of self-evaluation, 
and it will therefore be discussed in greater detail
hereafter. 

Self-Evaluation. It should come as no surprise that self-
evaluation is considered the least valid measure of teaching
effectiveness, although surprisingly little research has been 
undertaken to prove its invalidity (Carroll 181). Some experts 
even doubt that self-evaluation can improve instruction (see, for 
example, Seldin, "Self-Assessment" 71), though others view self-
evaluation as an appropriate means of gathering formative data
relative to the objectives, content, and organization of courses 
and to the instructor's ability to sustain interest and promote 
learning (Aleamoni, "Developing"). Carroll lends support to the 
latter view by noting that although self-ratings do not correlate 
highly with the evaluations of students or peers, they "are
particularly effective when they serve to identify for the 
instructor certain unexpected discrepancies with other ratings" 
(182). Centra adds that teachers are often able to identify 
their own strengths and weaknesses, "though they use only the 
positive end of a scale in doing so" (Determining 49). In short,
self-evaluation is best used in conjunction with other forms of 
assessment, and it can be a particularly effective means of 
getting teachers to confront discrepancies between self- 
perceptions and the perceptions of others, especially students. 

One promising method of self-evaluation involves the 
videotaping of class sessions, although experts have been careful 
to specify the circumstances under which it is most likely to be 
effective. Rezler and Anderson have found that instructors do 
not benefit unless they view the videotape with a trained 
consultant who can stop the videotape from time to time in order 
to draw attention to specific teaching behaviors. Salomon and 
McDonald have shown, further, that change occurs only when 
instructors can compare their videotaped behaviors with some 
model of effective teaching. Finally, Fuller and Manning provide 
a detailed set of guidelines for the use of videotaping for 
faculty self-evaluation. Briefly, those guidelines are as
follows:

1. Videotaping should be done in a typical classroom. 

2. Playback should be confidential. 

3. Teacher and consultant should reach prior agreement 
regarding the behaviors that are to be observed. 

4. Subjects must be enthusiastic, self-critical, and 
open to change. 

5. Feedback should be authoritative, fair, and 
constructive. 

6. Consultants must be empathetic and non-judgmental 
but also assertive in calling attention to remediable 
weaknesses. (509) 

Regardless of the form it assumes in a particular 
department, self-evaluation should follow a standard structure. 
Seldin ("Self-Assessment" 72) suggests that instructors might 
simply be asked to respond to the same questionnaire as their 
students. Seldin also recommends several formats for self- 
reports ("Self-Assessment" 73-74, Changing Practices 167-73); 
others can be found in Carroll (185-90), Centra (Determining 53-54),
and Larson (101-02). Kindsvatter and Wilen provide several
observation instruments for videotaped classes. Seldin describes 
the organization of a "teaching portfolio" in which
instructors can document their strengths and accomplishments
("Academic" 14-16). 

Conclusions. Concluding his 1970 survey of teacher
evaluation in English, Richard Larson declared: 

[I]t remains clear that the art of evaluation, though
it has been with us for many years, has by no means 
come to maturity. ... No one yet knows how to 
establish a dependable connection between an act or 
acts performed by an agent whom we call a teacher, and 
important changes on the persons — we call them
"students" — who interact in some mysterious way with
the teacher. (63-64)

The preceding discussion provides a cursory view of some of the
major developments in the field of instructional evaluation that 
have arisen since Larson delivered this sobering assessment. 
Although it is difficult to guess what a current survey of 
English departments might show, I do not sense that many have
incorporated state-of-the-art techniques for developing and 
assessing effective teaching. Certainly there has not been any 
"paradigm shift" in this area comparable to the one chronicled by 
many observers of the English curriculum. Whereas the 
development and evaluation of student writing, for example, has 
been the subject of so much research that we now have an 
impressive "meta-analysis" to sort out this body of scholarship 
(Hillocks), only a handful of articles address the measurement of 
effective English teaching in higher education. Surely we owe 
our colleagues the same care, deliberation, and kindness that our 
students deserve whenever we examine and evaluate their work. 

Furthermore, the current demand for accountability is 
spurred not only by the reports of foundations and government 
agencies but also by the more demagogic appeals of books like The 
Closing of the American Mind and ProfScam. These books, unlike
their counterparts from the sixties and seventies, have targeted 
higher education. Especially worrisome is the possibility that, 
facing such pressure, colleges and universities could be called 
upon to document student learning through testing, just as 
primary and secondary schools in a number of states currently are 
expected to do. Since they do not define learning in terms of 
acquisition and retention, English departments are particularly 
vulnerable in this regard and have a clear stake in finding and 
defending more appropriate means of measuring good teaching. 

More encouraging is the growing recognition that there is no 
entirely appropriate one-size-fits-all method of evaluating 
faculty. Researchers are finding that student ratings correlate 
with academic field (Cashin, "Assessing" 94). An increasing 
number of institutions encourage faculty members to draft 
professional-growth contracts, permitting some variance in the 
way effective teaching is defined and measured (Seldin, "Self- 
Assessment" 73). A few colleges and universities have even 
established separate career tracks for teaching and research 
(Mooney A18). Presumably professors choosing to emphasize the 
former can devote more time to instructional improvement and will 
enjoy greater access to appropriate resources. As English 
faculty begin to exploit this pluralism, they may find it 
necessary to assert vigorously their qualifications to define and 
measure good teaching in a manner they deem appropriate to the 
discipline.

I propose two strategies. First, English departments might
benefit from a current anthology of evaluative instruments and 
procedures validated within specific settings. Particularly 
useful would be any materials generated by English faculty in 
consultation with specialists in the field of instructional 
development. Second, scholars in English study might profitably 
establish their own corpus of research in the area of teaching 
assessment. Are we comfortably convinced, for example, that 
classroom observation by peers will not provide valid summative 
data in an English department? Isn't it possible that humanistic 
paradigms (e.g., the thick description of ethnography) provide 
richer sources of data than the empirical, often behaviorist, 
models of educationist research? The rationale for generating 
our own procedures and our own methods for validating them has 
been provided, ironically, by Lawrence Aleamoni, one of the
leading researchers in the educationist tradition: 

[I]f the department has not established clear criteria 
and guidelines, then [others] will begin imposing their 
own standards. It is imperative that departmental 
faculty be aware that, if they do not develop their own 
standards, someone else will impose his or her own.
("Some Practical Approaches" 76)

Notes

^1 Both reports call to mind the accountability movement and
comparable trends from previous decades. Brought on by a complex 
array of socioeconomic causes, such trends have aroused brief 
periods of scrutiny without fundamentally altering a system of 
academic rewards thought to privilege research over teaching. 
There are, however, reasons to surmise that the current movement 
may lead to more lasting results. First, renewed emphasis of 
teaching comes at a time when faculty in liberal-arts colleges 
and comprehensive ("second-tier") universities are already under 
increasing pressure to publish. The result, in the words of one 
harried young professor, is that "You cut out all the 
contemplative stuff. You find yourself teaching hysterically and 
doing research hysterically" (qtd. in Heller A14). Also,
demographic data suggest that colleges and universities will 
replace a disproportionate share of their faculties during the 
coming decade. Proponents of reform are therefore able to argue 
that the professional values of a generation of professors are at 
stake. Finally, economic hardships reinforce a disturbing appeal
to consumers found in the Cheney Report: "When faculty members 
teach less, there is a financial consequence. Because more 
people must be hired to teach, the costs of education 
escalate — and so does tuition" (qtd. in "'Research'" A24). 

^2 According to Arreola, emphasis of content expertise defines
a good teacher as one who provides the opportunity for learning; 
emphasis of instructional delivery skills defines a good teacher 
as one who encourages learning; emphasis of instructional design 
skills defines a good teacher as one who causes learning.
Obviously, no workable definition of good teaching should ignore 
any of these three areas. However, it should be clear that the 
first definition minimizes accountability, while the third 
definition accentuates it, possibly to the point of ushering in 
student achievement tests as a valid measure of teaching 
effectiveness. 